Abstract 

We have recently shown that the human nuclear pore-associated protein 1 (NPAP1)/C15orf2 gene encodes a nuclear pore-associated 
protein. This gene is one of several paternally expressed imprinted genes in the genomic region 15q11q13. Because Prader-Willi 
syndrome is known to be caused by the loss of function of paternally expressed genes in 15q11q13, a phenotypic contribution of 
NPAP1 cannot be excluded. NPAP1 appears to be under strong positive Darwinian selection in primates, suggesting an important 
function in primate biology. Interestingly, however, in contrast to all other protein-coding genes in 15q11q13, NPAP1 has no ortholog 
in the mouse. Our investigation of the evolutionary origin of NPAP1 showed that the gene is specific to primate species and absent 
from the 15q11q13-orthologous regions in all nonprimate mammals. However, we identified a group of paralogous genes, which we 
call NPAP1L, in all placental mammals except rodents. Phylogenetic analysis revealed that NPAP1, NPAP1L, and another group of 
genes (UPF0607), which is also restricted to primates, are closely related to the vertebrate transmembrane nucleoporin gene 
POM121, although they lack the transmembrane domain. These three newly identified groups of genes all lack conserved introns 
and hence are likely retrogenes. We hypothesize that, in the common ancestor of placentals, the POM121 gene retrotransposed and 
gave rise to an NPAP1-ancestral retrogene NPAP1L/NPAP1/UPF0607. Our results suggest that the nuclear pore-associated gene 
NPAP1 originates from the vertebrate nucleoporin gene POM121 and, after several steps of retrotransposition and duplication, has 
been subjected to genomic imprinting and positive selection after integration into the imprinted SNRPN-UBE3A chromosomal 
domain. 
Background 

Genomic imprinting is a process that regulates the expression 
of certain genes in a parent-of-origin-dependent manner and 
has evolved independently in plants and mammals (Feil and 
Berger 2007). Genes regulated by this epigenetic mechanism 
are thus expressed either from the maternal chromosome or 
the paternal chromosome only. Most imprinted genes are or- 
ganized in clusters, which are regulated by imprinting control 
regions (ICRs) that regulate the monoallelic expression of the 
genes in cis (Reik and Maher 1997; Ferguson-Smith 2011). 

The human genomic region 15q11q13 contains a cluster of 
imprinted genes with several paternally only expressed genes 
and one maternally only expressed gene (fig. 1). It is regulated 
by an ICR that includes the promoter of the SNRPN gene 
(Buiting et al. 1995; Saitoh et al. 1996; Ohta et al. 1999; 
Horsthemke and Buiting 2008). Loss of function of the pater- 
nally expressed genes in this region, most commonly arising 
through a ~6 Mb deletion on the paternally inherited chromosome, 
leads to Prader-Willi syndrome with neonatal muscular 
hypotonia and failure to thrive, childhood-onward hyperpha- 
gia and obesity, and mild-to-moderate intellectual disability 
(Cassidy 1997; Butler and Palmer 1983; Buiting 2010). 
We have recently shown that the paternally expressed gene 
C15orf2 encodes a nuclear pore complex (NPC)-associated 
protein, and it was therefore renamed nuclear 
pore-associated protein 1 (NPAP1) (Neumann et al. 2012). 
The exact function of the protein is not known, but we suspect 
a brain-specific NPC-related function because the protein 
is expressed in several human brain regions (Wawrzik et al. 
2010). Several studies showed that NPAP1 underwent strong 
positive Darwinian selection in the primate lineage (Nielsen 
et al. 2005; Kosiol et al. 2008; Wawrzik et al. 2010). 

Fig. 1. The imprinted gene cluster in the human genomic region 15q11q13. The region, shown from centromere to telomere, contains a number of 
genes expressed from the paternal chromosome only (blue) and the maternally only expressed gene UBE3A (red). The direction of transcription is indicated by 
arrowheads. Nonexpressed alleles are depicted in gray on the repressed allele. Biallelically expressed genes are depicted in black. The complex SNURF/SNRPN 
locus has several alternative transcription start sites and encodes two proteins (SNURF and SNRPN), several snoRNAs, and a UBE3A-antisense RNA. 
The existence of a continuous SNRPN transcript (blue arrow) containing upstream and downstream parts has, however, not yet been experimentally 
documented. The figure shows the gene expression according to fetal brain and is only approximately drawn to scale. 

The imprinted domain in 15q11q13 has assembled relatively 
recently during mammalian evolution from previously unlinked 
and nonimprinted components (Rapkins et al. 2006). 
The region evolved its imprinted regulation after fusion of 
two nonimprinted regions that contained SNRPN and 
UBE3A, respectively, 105-180 Ma (Rapkins et al. 2006). 
Other intronless genes, such as MKRN3, MAGEL2, and 
NDN, integrated independently into this genomic region 
after fusion of SNRPN and UBE3A, most probably by 
retrotransposition (Gray et al. 2000; Chai et al. 2001; Rapkins 
et al. 2006). It was hypothesized that at least some of the 
retrogenes integrated into the region after the evolution of 
imprinting in 15q11q13 and acquired their imprinted regulation 
subsequently (Chai et al. 2001; Rapkins et al. 2006). In 
the well-described murine orthologous region, two rodent-specific 
imprinted genes, Frat3/Peg12 and Atp5l-ps1, have 
been identified, suggesting that the process of gene acquisition 
is still ongoing and leads to divergent imprinted gene 
sets in the primate and rodent lineages (Chai et al. 2001). 
Here, we report that NPAP1 is a primate-specific gene that 
entered the imprinted region 15q11q13 by duplication from 
an ancestral paralog on human chromosome 9 during 
primate evolution. The ancestral gene, NPAP1L, in turn is 
derived from retrotransposition of POM121 in an ancestor 
of placentals. 



Materials and Methods 

Bioinformatic Tools 

NPAP1 homologous gene sequences were found using the 
"orthologs" list in the Ensembl database (www.ensembl.org, 
last accessed January 20, 2014) or using the BLAST-like 
alignment tool (BLAT) in the University of California, Santa 
Cruz (UCSC) genome browser 
(http://genome.ucsc.edu/cgi-bin/hgBlat?command=start, last accessed January 20, 2014). 
Gene and protein alignments were produced using the 
ClustalW multiple alignment tool included in the Geneious 
Pro 5.6 package (Biomatters Ltd, Auckland, New Zealand) 
with standard settings. The prediction of intronless open 
reading frames (ORFs) was also carried out with Geneious 
Pro 5.6 or the ORF Finder at the National Center for 
Biotechnology Information (NCBI) 
(http://www.ncbi.nlm.nih.gov/gorf/gorf.html, last accessed January 20, 2014). 
Exon prediction was performed with the software GENSCAN 
(http://genes.mit.edu/GENSCAN.html, last accessed January 20, 2014) and standard settings. 
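
The idea behind predicting an intronless ORF can be illustrated with a minimal sketch. It scans all six reading frames for the longest uninterrupted stretch from an ATG to the first in-frame stop codon. The function names are hypothetical, and the code is not the algorithm implemented in Geneious Pro or the NCBI ORF Finder; it only shows the principle under the assumption of the standard genetic code and an unambiguous DNA sequence.

```python
# Illustrative sketch only: longest ATG-to-stop ORF in six reading frames.
# Names are hypothetical; this is not the Geneious or NCBI ORF Finder code.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> str:
    """Return the longest ORF (ATG through stop codon) on either strand."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the first ATG of the ORF being read
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ORF in this frame
    return best
```

An ORF without an in-frame stop codon before the end of the sequence is ignored in this sketch; real tools usually offer options for such partial ORFs and for alternative start codons.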

Sequence Analyses 

Marmoset NPAP1 sequencing was performed with Big Dye 
Terminators (BigDye Terminator v1.1 Cycle Sequencing Kit, 
Life Technologies, Darmstadt, Germany) and the cycle se- 
quencing procedure. Products were analyzed with an ABI 
3100 Genetic Analyzer and Sequencing Analysis software 
(Life Technologies). The following primers were used: 
Marmoset_NPAP1_fw1: AAACACCCCAGCTCCGTGAGGA; 
Marmoset_NPAP1_rev1: GGATGGGCTGGGAAGTTGTGGC; 
Marmoset_NPAP1_fw2: CACAACAGGCCCTGCAAAAGGA; 
Marmoset_NPAP1_rev2: CCCCATGTAAAACGGGAGGCAC; 
Marmoset_NPAP1_fw3: ATCCAATTCTGGGGCTCTTG; 
Marmoset_NPAP1_rev4: TCCAAGGTGCCCAGGTCTC. 

Expression Analyses 

For expression analysis of NPAP1L, we used a customized 
panel of cDNAs from bovine tissues (caudate nucleus, cerebellum, 
cerebral cortex, hippocampus, hypothalamus, kidney, 
liver, placenta, skeletal muscle, testis) and DNA from bovine 
testis as a control (Zyagen, San Diego, USA). Polymerase chain 
reaction (PCR) assays were designed such that they spanned 
introns from the Ensembl gene prediction. For intron 1 
(1,067 bp), we used the primers Cow-NPAP1L-in1-fw2: 
TAACTATCCCTTTGACTCCCGA and Cow-NPAP1L-in1-rev2: 
CTGGAGCATAGATAACTGCCAA. For intron 2 (17 bp), we used 
the primers Cow-NPAP1L-in2-fw: CAAGCCTCAACTTATTTGCCTG 
and Cow-NPAP1L-in2-rev: TGGCAAACCTGAATCCATTTTG. 
Representative products were sequenced using the 
Applied Biosystems BigDye Terminator v1.1 Cycle 
Sequencing Kit (Life Technologies) and an ABI PRISM 3100 
Genetic Analyzer (Life Technologies). The control PCR on 
bovine ACTB was performed with the primers Fw2-beta-Actin-Cow: 
GGCACCCAGCACAATGAAGA and Rev2-beta-Actin-Cow: 
CGACTGCTGTCACCTTCACCG. 

Phylogenetic Analyses 

BlastP searches at NCBI (http://blast.ncbi.nlm.nih.gov, last 
accessed January 20, 2014) were performed using the peptide 
sequence of NPAP1 as query. The maximum likelihood (ML) 
tree based on the JTT+Γ4 amino acid substitution model was 
calculated using the MEGA5 program (Tamura et al. 2011) and 
the embedded alignment software MUSCLE (Edgar 2004), with 
a data set including all obtained sequences plus homologs 
identified by Ensembl. 

Results and Discussion 

Conservation of NPAP1 Orthologs in Primates 

Ensembl (release 69; http://www.ensembl.org, last accessed 
January 20, 2014; [Hubbard et al. 2009]) contains NPAP1 
orthologs in the chimpanzee (Pan troglodytes), gorilla 
(Gorilla gorilla), orangutan (Pongo abelii), and rhesus macaque 
(Macaca mulatta). In an effort to identify additional NPAP1 
orthologs in the 15q11q13-orthologous regions of other primate 
species, we searched the UCSC genome browser 
(http://genome.ucsc.edu, last accessed January 20, 2014; 
[Kent 2002]) using BLAT (Kent et al. 2002) with the 
human gene sequence as query. This search resulted in the 
identification of NPAP1 orthologs in the gibbon (Nomascus 
leucogenys), the squirrel monkey (Saimiri boliviensis), and 
the marmoset (Callithrix jacchus). In the genome of the 
bushbaby (Otolemur garnettii), a member of the more distantly 
related suborder Strepsirrhini (fig. 2), however, we could not 
find an NPAP1-homologous sequence in the 15q11q13-orthologous 
region. The genomes of baboon (Papio anubis), 
tarsier (Tarsius syrichta), and mouse lemur (Microcebus murinus) 
are not sufficiently assembled to determine syntenic 
relationships. 

Of all analyzed primates, the marmoset and the squirrel 
monkey, members of the Haplorhini parvorder Platyrrhini, 
were the most distantly related species that have 
orthologs of NPAP1 (fig. 2). An alignment showed that the 
marmoset NPAP1 gene on chromosome 6 contains a ~2.5 kb 
gap in the part that aligns with the human open reading 
frame (ORF). We thus sequenced the missing part of the 
gene in the marmoset and obtained a complete gene sequence 
(supplementary table S1, Supplementary Material 
online) that is 69.9% identical with the human NPAP1. In 
contrast to NPAP1 orthologs of Catarrhini species, the marmoset 
NPAP1 lacks a long intronless ORF. GENSCAN 
(http://genes.mit.edu/GENSCAN.html, last accessed January 20, 
2014, [Burge and Karlin 1997]) predicts a 3.06 kb ORF with 
two small introns that starts 164 bp upstream of the human 
ORF and ends at an apparently homologous position 
(supplementary table S1, Supplementary Material online). The 
second and third exons of the GENSCAN prediction contain six 
deletions relative to the Catarrhini genomes; all of them 
are multiples of 3 bp and up to 39 bp long, and thus appear 
to be in frame and coding. By contrast, the upstream part of 
the gene contains a number of small indels that are not multiples 
of three, suggesting that this region is not part of the 
ORF and might have become partially pseudogenized. The 
indel pattern suggests a balancing selection against frameshifts 
inside the coding sequence, as 1 bp indels occur with 
approximately ten times higher probability than 3 bp indels (de 
la Chaux et al. 2007). This argues for a protein-coding function 
of marmoset NPAP1 despite the absence of the expected 
intronless ORF. As NPAP1 orthologs were found in all analyzed 
members of Haplorhini but not in any members of 
Strepsirrhini (fig. 2), the gene presumably integrated into the 
15q11q13-orthologous region after the two primate suborders 
diverged about 60-70 Ma (Springer et al. 2012). 
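
The frame argument used above rests on simple arithmetic: an indel whose length is a multiple of 3 bp leaves the downstream reading frame intact, whereas any other length shifts it. A minimal sketch of this classification follows; the function name is hypothetical and the lengths in the usage note are illustrative values, not the measured marmoset indels.

```python
# Illustrative sketch: separate frame-preserving indels from frameshifting ones.
# The function name and any example lengths are hypothetical.

def classify_indels(lengths):
    """Split indel lengths (in bp) into (in_frame, frameshift) lists."""
    in_frame = [n for n in lengths if n % 3 == 0]
    frameshift = [n for n in lengths if n % 3 != 0]
    return in_frame, frameshift
```

For example, `classify_indels([3, 24, 39, 1, 2])` places 3, 24, and 39 in the in-frame list and 1 and 2 in the frameshift list. A region in which only in-frame indels are observed is compatible with a maintained coding sequence, which is the reasoning applied to the second and third predicted exons.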

A New Family of NPAP1 Homologous Sequences in 
Placental Mammals 

In addition to the aforementioned primate NPAP1 genes, 
Ensembl (release 69) also contains apparently homologous 
genes from dog (ENSCAFG00000014649 and 
ENSCAFG00000023307), cow (ENSBTAG00000046462), 
pig (ENSSSCG00000022532), and elephant 
(ENSLAFG00000014287 and ENSLAFG00000031777) that 
were annotated as NPAP1 orthologs. In light of the known 
absence of NPAP1 genes from the murine orthologous region, 
this annotation and phylogenetic distribution seemed to contradict 
the traditional view of mammalian evolution, which 
places rodents closer to humans than ruminants and carnivores 
(e.g., Springer et al. 2003). Alternatively, but less 
parsimoniously, the nonprimate homologs might be of different 
origin. Therefore, we investigated the conservation of 
synteny among the Ensembl orthologs more closely. Using 
the UCSC genome browser, we found that only primate 
NPAP1 genes share synteny with UBE3A and SNRPN, whereas 
the homologous genes in other mammals are located between 
their respective orthologs of transducin-like enhancer 
of split (TLE) 1 and TLE4 (fig. 3). We refer to the group of 
orthologous genes located in this synteny group as nuclear 
pore-associated protein 1-like (NPAP1L). TLE1 and TLE4 belong 
to a larger region of well-conserved synteny including orthologs 
of the human genes GNA14, GNAQ, PSAT1, TLE4, TLE1, 
RASEF, and FRMD3 (centromeric to telomeric on human 
chr9q21). Because the complete synteny group can also be 
found in chicken, which does not contain NPAP1L, it is likely 
that NPAP1L integrated into this region as a single gene during 
mammalian evolution. A possible mechanism for this would 
be retrotransposition (Kaessmann 2010). 

Fig. 2. NPAP1 conservation in primates. (A) Cladogram showing the family structure of primates and the relationship of the analyzed primate species. 
NPAP1 orthologs were found in all analyzed members of the parvorders Platyrrhini and Catarrhini (red). (B) Detailed view of an alignment of selected primate 
NPAP1 orthologs. The figure shows a well-conserved region inside the human ORF (1,965-2,062 bp from NCBI reference sequence NM_018958.2) that 
contains one of the numerous in-frame deletions found in the marmoset sequence. 
[Figure panels not reproduced. Panel labels: Bushbaby (no NPAP1 ortholog), Tarsier (NPAP1 ortholog unknown), Marmoset, Rhesus macaque, Great apes (human, chimpanzee, orangutan); the alignment panel marks a 24-bp in-frame deletion.] 

346 Genome Biol. Evol. 6(2):344-351. doi:10.1093/gbe/evu019 Advance Access publication January 29, 2014 

NPAP1 is a POM121-Related Retrogene 
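
The synteny criterion used to assign a gene to the NPAP1L group, namely a position between the TLE4 and TLE1 orthologs, can be expressed as a short check on an ordered gene list. The sketch below is only an illustration: the function name is hypothetical, and the gene orders in the usage note are simplified assumptions based on the human 9q21 order given in the text, not genome browser output.

```python
# Illustrative sketch: test whether a gene lies between two flanking genes
# in an ordered list of gene names along a chromosome or scaffold.
# The function name is hypothetical.

def lies_between(gene_order, gene, flank_a, flank_b):
    """Return True if `gene` is located between `flank_a` and `flank_b`."""
    positions = {name: index for index, name in enumerate(gene_order)}
    if not all(name in positions for name in (gene, flank_a, flank_b)):
        return False  # gene or a flanking marker is absent from this region
    low, high = sorted((positions[flank_a], positions[flank_b]))
    return low < positions[gene] < high
```

With an assumed order such as `["GNA14", "GNAQ", "PSAT1", "TLE4", "NPAP1L", "TLE1", "RASEF", "FRMD3"]`, the call `lies_between(order, "NPAP1L", "TLE4", "TLE1")` returns `True`; for a region that contains the flanking genes but no NPAP1L, as described for chicken, it returns `False`.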

Ensembl (release 69) also contains genes from three additional 
mammalian species (cat, ENSFCAG00000010320; tree 
shrew, ENSTBEG00000007225; ferret, ENSMPUG00000019448) 
that were annotated as NPAP1. The current quality of 
the genome assemblies of these species does not allow us 
to conduct robust synteny analyses. However, cat 
scaffold GL897178.1 contains both TLE1 and 
ENSFCAG00000010320, suggesting that ENSFCAG00000010320 is another 
ortholog of NPAP1L. All Ensembl-annotated NPAP1L orthologs 
were predicted to be protein coding by the Ensembl pipeline 
(Potter et al. 2004). 

Using the UCSC genome browser and the BLAT algorithm 
with human NPAP1 as query, we found NPAP1L sequences in 
the genomes of additional mammals, for example, horse and 
rabbit. We could, however, not identify significant hits in 
either rodents or marsupials. In the genomes of mouse and 
rat, TLE4 (on murine chr. 19) and TLE1 (on murine chr. 4) have 
lost their syntenic position, creating the impression that 
NPAP1L might have been lost in their common ancestor 
during chromosomal rearrangements. In the guinea pig 
genome, however, the TLE4-TLE1 synteny group is found on 
scaffold 21 although NPAP1L was not identified, suggesting 
rather that NPAP1L was lost in rodents before the chromosomal 
rearrangements in this region. In the human genome, 
our search revealed two copies of NPAP1L in opposite orientations 
on human chromosome 9 that match Ensembl-annotated 
processed pseudogenes (ENSG00000238002 
on the plus strand and ENSG00000236521 on the minus 
strand). The same arrangement of NPAP1L genes was also 
observed in other primate species (fig. 3). Both human 
NPAP1L copies contain only short ORFs, which code for peptides 
without significant amino acid sequence similarity to 
NPAP1 and NPAP1L proteins, confirming their annotation as 
processed pseudogenes. As in the human genome, the genomes 
of dog and elephant each contain 
two highly similar copies of NPAP1L with opposite orientations. 
However, in contrast to dog and elephant, the 
human NPAP1L copies are tail-to-tail oriented, suggesting 
that NPAP1L tandem-duplicated several times independently 
(fig. 3). 

Fig. 3. Synteny conservation in mammals. NPAP1 and NPAP1L orthologs lie in two different synteny groups, which are orthologous to human 
15q11q13 or 9q21, respectively. The arrows show the orientation of genes from 5' to 3'. Mutations leading to pseudogenization of primate NPAP1L genes 
are represented by stars. Because the relative orientations of the two neighboring NPAP1L copies differ between primates and dog/elephant, these 
duplications do not appear to have a common origin. 

Expression Analysis of NPAP1L 

As mentioned earlier, all nonprimate NPAP1L orthologs were 
predicted to be protein coding by Ensembl. In an effort to 
verify the in silico prediction in a nonprimate species, we analyzed 
the expression of bovine NPAP1L. To this end, we used 
a bovine cDNA panel (Zyagen) and two PCR assays spanning 
the two introns of the Ensembl-predicted gene 
(ENSBTAG00000046462). As we had shown before that 
human NPAP1 is expressed in testis and brain (Farber et al. 
2000; Wawrzik et al. 2010), we included cDNA from different 
bovine brain regions and testis in our panel. With both PCR 
assays, we observed expression of bovine NPAP1L in four of 
five analyzed brain regions: caudate nucleus, cerebellum, hippocampus, 
and hypothalamus. With one of the two assays, 
we also obtained weak signals for the cerebral cortex, kidney, 
and testis, but we did not observe expression in liver, placenta, 
or skeletal muscle (data not shown). Curiously, by gel analysis 
and Sanger sequencing, the obtained products from cDNA did 
not correspond to mRNA of the expected splicing pattern but 
were colinear with genomic DNA. 

Although our cDNA panel had been obtained from a 
commercial source (Zyagen) that tests for residual DNA contamination 
as part of its quality control procedure, the 
unexpected finding of colinear RNA expression prompted us 
to double-check for contamination with genomic DNA. To this 
end, we used an intron-spanning RT-PCR for the ACTB locus 
that gives rise to a 131-bp larger product when genomic DNA 
is amplified. In all cDNA samples, we only obtained the product 
that is expected from ACTB cDNA, making a contamination 
of the cDNAs with genomic DNA very unlikely. We 
conclude that NPAP1L is expressed in the cow, primarily in 
the brain, but that the splicing pattern predicted by Ensembl 
is not correct in any of the analyzed tissues. Alternatively, it is 
possible that a 2,688-bp intronless ORF is expressed in the cow 
and would lead to a shorter NPAP1-homologous protein of 
895 amino acids. Even if the bovine NPAP1L ortholog should 
not be protein coding, this would not per se exclude 
protein-coding functions for other mammalian NPAP1L orthologs. 
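
The relation between the ORF length and the protein length quoted above can be checked with a few lines of arithmetic. The sketch assumes that the stated 2,688 bp include the terminal stop codon, which is consistent with the 895 amino acids given in the text; the function name is hypothetical.

```python
# Illustrative sketch: protein length encoded by an ORF of a given length.
# Assumption: the ORF length counts the stop codon unless stated otherwise.

def protein_length(orf_bp, includes_stop=True):
    """Return the number of amino acids encoded by an ORF of `orf_bp` bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 bp")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons
```

For the bovine ORF, 2,688 bp correspond to 896 codons, that is, 895 amino acids plus one stop codon, so `protein_length(2688)` evaluates to 895.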

NPAP1 and NPAP1L Belong to a POM121-Related 
Gene Family 

Because functional NPAP1L sequences could be found in all 
analyzed mammals except rodents, whereas NPAP1 is unique 
to primate species, we hypothesized that NPAP1 originates 
from the NPAP1L gene locus. In order to investigate this hypothesis 
in an unbiased way, we performed a BlastP 
(http://blast.ncbi.nlm.nih.gov, last accessed January 20, 2014) search 
with the human NPAP1 protein sequence as query. Sequences 
obtained in this search as well as Ensembl-annotated protein 
sequences were used for the inference of phylogenetic relationships. 
A maximum likelihood (ML) tree (fig. 4) was reconstructed 
using representatives of major tetrapod taxa. This 
data set consisted of NPAP1 amino acid sequences from different 
primate species, of a group of proteins designated as 
UPF0607 that are also restricted to primates, of NPAP1L 
proteins from elephant, dog, cat, pig, and cow, and of distantly 
related POM121 and POM121-like proteins 
(supplementary table S2, Supplementary Material online). 

Fig. 4. ML tree of NPAP1-homologous proteins. Bootstrap values (left) and associated support values (right) are given for all nodes that have bootstrap 
values above 50. The tree topology suggests that NPAP1, UPF0607, and NPAP1L are derived from the vertebrate-specific nucleoporin gene POM121. 
NPAP1L seems to be ancestral to the primate-specific sister genes NPAP1 and UPF0607. On the basis of branch lengths, it can be seen that the two 
neighboring NPAP1L copies (NPAP1La and NPAP1Lb) in dog and elephant are the result of two independent duplications. 

The inferred ML tree suggests that the genes NPAP1L, 
NPAP1, and UPF0607 form a monophyletic group that is derived 
from POM121 (fig. 4). The most straightforward interpretation 
of the tree topology and the taxonomic distribution 
of genes is that NPAP1L duplicated in the lineage leading to 
primates, resulting in an NPAP1/UPF0607 gene. Before the 
primate radiation, this ancestral NPAP1/UPF0607 gene duplicated 
again to give rise to NPAP1 and UPF0607. The 
primate-specific UPF0607 gene duplicated in a subset of primate 
species, resulting in two groups of genes, namely UPF0607a 
and -b (see fig. 4 and supplementary table S2, Supplementary 
Material online). In dog and elephant, two copies of NPAP1L 
genes derived from recent independent duplication events are 
found, namely NPAP1La and -b (see fig. 4 and supplementary 
table S2, Supplementary Material online). In the lineage leading 
to primates, the ancestral NPAP1L gene pseudogenized 
and is therefore not included in our phylogenetic analysis. 
However, remnants of this gene are still identifiable in all 
analyzed primate species. Taken together, the tree topology 
supports our initial hypothesis that NPAP1 is derived from the 
NPAP1L gene locus and adds POM121 as a common ancestor 
of both groups of genes. 

Although the vertebrate nucleoporin gene POM121 contains 
several large and highly conserved introns, all NPAP1 and 
UPF0607 genes are intronless. NPAP1L genes are predicted to 
contain small introns; however, the intronic structure is not 
conserved between the orthologs and presumably evolved 
secondarily from an intronless ancestral gene. In light of 
these observations, the phylogenetic analysis suggests an evolutionary 
scenario wherein POM121 duplicated via retrotransposition 
in the last common ancestor of placentals, giving rise 
to the intronless NPAP1L/NPAP1/UPF0607 retrogene (fig. 5). 
The first exon of POM121, which includes its transmembrane 
domain, was lost during or following retrotransposition, 
because none of the orthologs of NPAP1L, NPAP1, or 
UPF0607 are predicted transmembrane proteins. The loss of 
5' gene parts is characteristic of gene duplications via 
retrotransposition (Ding et al. 2006). More recently, before the 
primate radiation, this gene duplicated or retrotransposed 
twice more, giving rise to the genes NPAP1 and UPF0607. In 
dog and elephant, NPAP1L was subject to independent 
tandem duplications, whereas it was lost in the rodent lineage 
(fig. 5). 

Conclusions 

Our results show that the imprinted, primate-specific NPAP1 
gene originates from the vertebrate nucleoporin gene 
POM121. It is part of a so far unrecognized gene family 
of POM121-related retrogenes, the members of which 
can be considered possible candidates for mammal- and 
primate-specific NPC-associated functions. Unlike POM121, 
the predicted proteins lack a transmembrane domain and 
thus appear to have functionally diverged from their ancestral 
protein. 

Supplementary Material 

Supplementary tables S1 and S2 are available at Genome 
Biology and Evolution online (http://www.gbe.oxfordjournals.org/). 